We sequenced and compared the complete genomes of 22 strains of coronavirus HKU1 (CoV HKU1)
obtained from nasopharyngeal aspirates of patients with respiratory tract infections over a 2-year period.
Phylogenetic analysis of 24 putative proteins and polypeptides showed that the 22 CoV HKU1 strains fell into
three clusters (genotype A, 13 strains; genotype B, 3 strains; and genotype C, 6 strains). However, different
phylogenetic relationships among the three clusters were observed in different regions of their genomes. From
nsp4 to nsp6, the genotype A strains were clustered with the genotype B strains. For nsp7 and nsp8 and from
nsp10 to nsp16, the genotype A strains were clustered with the genotype C strains. From hemagglutinin
esterase (HE) to nucleocapsid (N), the genotype B strains were clustered closely with the genotype C strains.
Bootscan analysis showed possible recombination between genotypes B and C from nucleotide positions 11500
to 13000, corresponding to the nsp6-nsp7 junction, giving rise to genotype A, and between genotypes A and B
from nucleotide positions 21500 to 22500, corresponding to the nsp16-HE junction, giving rise to genotype C.
Multiple alignments further narrowed the sites of crossover to a 143-bp region between nucleotide positions
11750 and 11892 and a 29-bp region between nucleotide positions 21502 and 21530. Genome analysis also
revealed various numbers of tandem copies of a perfect 30-base acidic tandem repeat (ATR) which encodes
NDDEDVVTGD and various numbers and sequences of imperfect repeats in the N terminus of nsp3 inside the
acidic domain upstream of papain-like protease 1 among the 22 genomes. All 10 CoV HKU1 strains with
incomplete imperfect repeats (1.4 and 4.4) belonged to genotype A. The present study represents the first
evidence for natural recombination in a coronavirus associated with human infection. Analysis of a single gene
is not sufficient for the genotyping of CoV HKU1 strains; genotyping requires amplification and sequencing of
at least two gene loci, one from nsp10 to nsp16 (e.g., pol or helicase) and another from HE to N (e.g., spike or N).
Further studies will delineate whether the ATR is useful for the molecular typing of CoV HKU1.


The recent severe acute respiratory syndrome (SARS) epidemic, the
discovery of SARS coronavirus (CoV), and the identification of SARS
CoV-like viruses from Himalayan palm civets and a raccoon dog from
wild-animal live markets in mainland China have led to a boost in
interest in the discovery of novel coronaviruses in both humans and
animals (8, 23, 26, 28, 40, 42). In 2004, a novel group 1 human
coronavirus (HCoV), NL63, was reported independently by two groups (6,
34). In 2005, we described the discovery, complete genome sequence,
clinical features, and molecular epidemiology of a novel group 2 human
coronavirus, HKU1 (genotype A) (17, 37-39, 41). This virus has also
subsequently been found in patients with respiratory tract infections
in other countries (1, 30, 33). Recently, we have also identified a
SARS CoV-like virus in Chinese horseshoe bats and a novel group 1
coronavirus in large bent-winged bats, lesser bent-winged bats, and
Japanese long-winged bats in the Hong Kong Special Administrative
Region (16, 27). The discovery of SARS CoV-like viruses in horseshoe
bats was confirmed by another group in other provinces in China (19).

As a result of the unique mechanism of viral replication,
coronaviruses have a high frequency of recombination (15). Their
tendency for recombination and high mutation rates may allow them to
adapt to new hosts and ecological niches. However, no convincing
evidence among human coronaviruses of genetic recombination that may
have contributed to their ability to reinfect humans has been
documented. In our study of the phylogeny of the RNA-dependent RNA
polymerase (pol), spike (S), and nucleocapsid (N) genes of nine
isolates of CoV HKU1 recovered from patients with pneumonia, it was
discovered that the sequences of the S and N genes fell into two
distinct genotypes, with seven strains belonging to genotype A and two
belonging to genotype B (41). On the other hand, for the pol gene, one
of the two “genotype B” strains as determined by its S and N sequences
(from patient 8) was clustered with the other seven “genotype A”
strains (41). Furthermore, the same phenomenon was also observed in our
subsequent prospective study of CoV HKU1-associated respiratory tract
infections (17). Based on these observations, we suspected that there
is an additional CoV HKU1 genotype which has arisen from recombination
between genotypes A and B of CoV HKU1.

To test this hypothesis, we performed complete genome sequencing on 21
additional strains of CoV HKU1 and compared their genomes to the CoV
HKU1 genotype A strain (38). The sites of recombination were
identified, and a novel CoV HKU1 genotype, genotype C, was defined.

TABLE 1. Characteristics of the 22 CoV HKU1 strains used in this study

                                               Patient characteristic                              CoV HKU1 characteristic
Strain  Source                      Mo/yr of   Sex^a  Age (yr)  Upper/lower  Underlying  Clinical  Genotype  No. of      No. of
                                    detection                   respiratory  disease     outcome             NDDEDVVTGD  imperfect
                                                                tract                                        repeats     repeats
                                                                infection
N2      Patient 1 in reference 41   Mar/03     F      35        Lower        Absent      Survived  B         11          2
N3      Patient 2 in reference 41   Apr/03     M      66        Lower        Present     Died      A         14          2
N1      Patient 5 in reference 41   Jan/04     M      71        Lower        Present     Survived  A         14          2
N5      Patient 8 in reference 41   Jan/04     M      68        Lower        Absent      Survived  C         8           3
N6      Unpublished                 Jan/04     F      29        Upper        Present     Survived  A         2           1.4
N7      Patient 4 in reference 41   Jan/04     M      75        Lower        Present     Survived  A         12          1.4
N9      Patient 9 in reference 41   Mar/04     F      83        Lower        Present     Survived  A         12          1.4
N10     Patient 10 in reference 41  Mar/04     M      72        Lower        Present     Died      A         13          1.4
N11     Patient 1 in reference 17   Apr/04     F      2         Upper        Absent      Survived  A         15          1.4
N13     Patient 2 in reference 17   May/04     F      7         Upper        Present     Survived  A         9           1.4
N14     Patient 3 in reference 17   Jul/04     M      84        Upper        Present     Survived  A         10          1.4
N15     Patient 4 in reference 17   Nov/04     M      3         Upper        Present     Survived  B         15          1
N16     Patient 5 in reference 17   Nov/04     M      87        Lower        Present     Survived  C         10          4
N17     Patient 6 in reference 17   Nov/04     F      4         Upper        Present     Survived  C         10          2
N18     Patient 7 in reference 17   Nov/04     M      2         Lower        Absent      Survived  A         13          1.4
N19     Patient 8 in reference 17   Dec/04     M      19        Upper        Absent      Survived  A         11          3
N20     Patient 9 in reference 17   Dec/04     M      3         Upper        Absent      Survived  C         10          4
N21     Patient 10 in reference 17  Dec/04     F      9         Upper        Present     Survived  C         10          3
N22     Patient 11 in reference 17  Jan/05     F      3         Upper        Absent      Survived  C         13          3
N24     Patient 12 in reference 17  Jan/05     M      5         Upper        Present     Survived  A         17          4.4
N23     Patient 13 in reference 17  Feb/05     M      4         Upper        Present     Survived  A         11          1.4
N25     Unpublished                 Feb/05     F      5 mo      Upper        Absent      Survived  B         12          2

^a M, male; F, female.

MATERIALS AND METHODS 

CoV HKU1 strains. All 22 CoV HKU1 strains were isolated from patients with
respiratory tract infections in Hong Kong in a 2-year period (March 2003 to
February 2005) (Table 1) (17, 38, 41).

RNA extraction. Viral RNA was extracted from the nasopharyngeal aspirates
of the patients using a QIAamp viral RNA mini kit (QIAGEN, Hilden,
Germany). The RNA pellet was resuspended in 10 µl of DNase-free,
RNase-free double-distilled water and was used as the template for reverse
transcription-PCR.

Complete genome sequencing and genome analysis. The complete genome
sequence of the CoV HKU1 genotype A strain was described previously
(GenBank accession no. NC_006577) (38). The complete genomes of the other
21 CoV HKU1 strains were amplified and sequenced using the RNA extracted
from the nasopharyngeal aspirates of the corresponding patients as the template,
by a strategy described previously (38). The RNA was converted to cDNA by a
combined random-priming and oligo(dT) priming strategy. The 5' ends of the
viral genomes were confirmed by rapid amplification of cDNA ends using the
5'/3' RACE kit (Roche, Germany). Sequences were assembled and manually
edited to produce final sequences of the viral genomes. The 21 genomes were
compared to that of the CoV HKU1 genotype A strain and were manually
annotated.

Phylogenetic-tree construction. The nucleotide sequences for nsp1, nsp2,
conserved portions of nsp3 (including papain-like protease 1 [PL1pro], a member of
the Appr-1-p processing enzyme family [A1pp], papain-like protease 2 [PL2pro],
and the hydrophobic domain [HD]), nsp4 to nsp10, nsp12 to nsp16, hemagglutinin
esterase (HE), S, open reading frame 4 (ORF4), envelope (E), membrane (M),
and N were extracted from the 22 CoV HKU1 genomes. Phylogenetic-tree
construction was performed using the neighbor-joining method with ClustalX
1.83. The corresponding nucleotide sequences of human coronavirus OC43
(GenBank accession no. AY585229) were used as outgroups.
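The legend to Fig. 1 states that the neighbor-joining trees used the Jukes-Cantor correction. As a minimal illustration of that correction (not the ClustalX implementation), the following Python sketch computes the uncorrected proportion of differing sites between two aligned sequences and converts it to a Jukes-Cantor distance. The function names are hypothetical, and the choice to skip gap columns is an assumption made for the example.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of differing sites between two aligned sequences.

    Assumption for this sketch: columns with a gap ('-') in either
    sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor corrected distance: d = -(3/4) * ln(1 - (4/3) * p)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)
```

For closely related sequences the corrected distance is only slightly larger than the raw proportion of differences; the correction matters more as sequences diverge.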


Bootscan analysis. To perform bootscan analysis, a nucleotide alignment of
the genome sequences of one genotype A (38), one genotype B (patient 1 of
reference 41), and one genotype C (patient 8 of reference 41) strain of CoV
HKU1 and one HCoV OC43 strain (GenBank accession no. AY585229) was
generated by ClustalX, version 1.83, and edited manually. Bootscan analysis was
performed using Simplot version 3.5.1 (F84 model; window size, 1,000 bp; step,
200 bp) (20), with the genome sequence of HCoV OC43 as a query.
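The window and step settings above define a sliding scan along the alignment. The Python sketch below illustrates only that windowing idea, with a simplification: within each window it picks the reference with the fewest mismatches to the query, whereas Simplot's bootscan builds bootstrapped trees under the F84 model. The function names and the mismatch criterion are assumptions for illustration, not Simplot's algorithm.

```python
from typing import Dict, Iterator, Tuple


def sliding_windows(alignment_length: int, window: int = 1000,
                    step: int = 200) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) column ranges of full-size windows."""
    start = 0
    while start + window <= alignment_length:
        yield start, start + window
        start += step


def closest_reference(query: str, references: Dict[str, str],
                      start: int, end: int) -> str:
    """Name of the reference with the fewest mismatches to the query
    in columns [start, end). Ties go to the first reference listed."""
    def mismatches(name: str) -> int:
        ref = references[name]
        return sum(q != r for q, r in zip(query[start:end], ref[start:end]))
    return min(references, key=mismatches)


def scan(query: str, references: Dict[str, str],
         window: int = 1000, step: int = 200):
    """Return (window start, window end, closest reference) for each window."""
    return [(s, e, closest_reference(query, references, s, e))
            for s, e in sliding_windows(len(query), window, step)]
```

A change in the closest reference between neighboring windows is the kind of signal that a bootscan plot displays as crossing support curves.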

Nucleotide sequence accession numbers. The nucleotide sequences of the 21
additional genomes of CoV HKU1 strains (data not shown) have been lodged within
the GenBank sequence database under accession no. AY884001, DQ339101,
DQ415896, DQ415897, DQ415898, DQ415899, DQ415900, DQ415901, DQ415902,
DQ415903, DQ415904, DQ415905, DQ415906, DQ415907, DQ415908, DQ415909,
DQ415910, DQ415911, DQ415912, DQ415913, and DQ415914.

RESULTS 

Complete genome sequence, genome organization, phylogenetic
analysis, and genotypes. The sizes of the genomes of the
22 CoV HKU1 strains ranged from 29,295 to 30,097 nucleotides.
The G+C contents of all 22 genomes were 32%. The
overall genome organizations of the 22 CoV HKU1 strains
were the same (Fig. 1A).
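G+C content is the fraction of G and C bases among the counted nucleotides of a genome. A minimal Python sketch follows; the function name is hypothetical, and ignoring ambiguous bases (such as N) is an assumption of the example rather than a statement about how the values above were computed.

```python
def gc_content(sequence: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T, U)."""
    bases = [b for b in sequence.upper() if b in "ACGTU"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "GC" for b in bases) / len(bases)
```

Applied to a complete genome sequence, a return value near 0.32 would correspond to the 32% reported above.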

Phylogenetic trees using the nucleotide sequences of genes
for putative proteins and polypeptides (nsp1, nsp2, conserved
portions of nsp3 [PL1pro, A1pp, PL2pro, and HD], nsp4 to nsp10,
nsp12 to nsp16, HE, S, ORF4, E, M, and N) of the 22 CoV
HKU1 strains were constructed and are shown in Fig. 1B. In 18
of the 24 trees, the 22 CoV HKU1 strains fell clearly into three
clusters, named genotype A (13 strains), genotype B (3
strains), and genotype C (6 strains). The exceptions are the five
trees constructed using nsp1, nsp2, PL1pro, PL2pro, and HD, in
which the differences among the sequences were too small, and
the nsp10 tree, in which two genotype A strains, N1 and N3,
were clustered with genotype C strains.

The three genotypes exhibited different relationships to each
other in different regions of their genomes. From nsp4 to nsp6,
the genotype A strains were clustered with the genotype B
strains, but for nsp7 and nsp8, the genotype A strains were
clustered with the genotype C strains. From nsp10 to nsp16,
the genotype A strains were clustered closely with the genotype
C strains, with high bootstrap values, but from HE to N, the
genotype B strains were clustered closely with the genotype C
strains, with bootstrap values of 1,000 in all cases. No association
was observed between the genotypes and the time of
detection or the age, sex, clinical disease, presence of underlying
disease, or outcome of the patients (Table 1).

FIG. 1. Genome organization and phylogenetic analysis of CoV HKU1. (A) Genome organization of CoV HKU1. The regions of the genome
used for phylogenetic tree construction are labeled. (B) Phylogenetic analysis of nsp1, nsp2, conserved portions of nsp3 (PL1pro, A1pp, PL2pro, and
HD), nsp4 to nsp10, nsp12 to nsp16, HE, S, ORF4, E, M, and N of the 22 CoV HKU1 genomes. The trees were constructed by the neighbor-joining
method using Jukes-Cantor correction and bootstrap values calculated from 1,000 trees. Seven hundred forty, 1,821, 655, 333, 895, 1,263, 1,488,
909, 861, 276, 582, 330, 411, 2,783, 1,809, 1,563, 1,125, 900, 1,276, 4,126, 330, 253, 687, and 1,358 nucleotide positions in nsp1, nsp2, PL1pro, A1pp,
PL2pro, HD, nsp4, nsp5, nsp6, nsp7, nsp8, nsp9, nsp10, nsp12, nsp13, nsp14, nsp15, nsp16, HE, S, ORF4, E, M, and N, respectively, were included
in the analysis. The scale bar indicates the estimated number of substitutions per 50 or 100 nucleotides as indicated. The corresponding nucleotide
sequences of HCoV OC43 were used as the outgroups. A, genotype A; B, genotype B; and C, genotype C.

The putative transcription regulatory sequence motif,
5'-AAUCUAAAC-3' (as in mouse hepatitis virus [MHV] and bovine
coronavirus) (22) or, alternatively, 5'-UAAAUCUAAAC-3', which was
described in the genome of the CoV HKU1 genotype A strain at the
3' end of the leader sequence and preceding each translated ORF
except ORF5, was also present in all of the other 21 CoV HKU1
genomes. On the other hand, the sequence of the putative internal
ribosomal entry site (IRES) (32) for the ORF of the envelope protein
in the genomes of all three CoV HKU1 genotype B strains and all
six CoV HKU1 genotype C strains was UUUUAUCGCUUGG, instead of
AUUUAUUGUUUGG in all 13 CoV HKU1 genotype A strains, although
both sequences were similar to the IRES element, UUUUAUUCUUUUU,
in MHV (10).

The 22 genomes differed in their numbers of tandem copies
of the 30-base acidic tandem repeat (ATR) in the N terminus
of nsp3 inside the acidic domain upstream of PL1pro (Tables 1
and 2). All 22 genomes had tandem copies of a perfect 30-base
repeat which encodes NDDEDVVTGD and various numbers
of imperfect repeats. The median number of tandem copies of
the perfect 30-base repeat was 11.5 (range, 2 to 17), and the
median number of imperfect repeats was 2 (range, 1 to 4). All
of the 10 CoV HKU1 strains with incomplete imperfect repeats
(1.4 and 4.4) belonged to genotype A.
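The medians quoted above can be rechecked from the per-strain values in Table 1. The Python sketch below transcribes the genotype and repeat counts for the 22 strains and recomputes the median number of perfect repeats, its range, the median number of imperfect repeats, and the genotypes of the strains with incomplete imperfect repeats. The variable names are hypothetical, and the N2 perfect-repeat count is taken as 11, which is an assumption about a poorly legible table entry.

```python
from statistics import median

# (strain, genotype, perfect repeats, imperfect repeats), transcribed from Table 1.
# Assumption: the N2 perfect-repeat entry is read as 11.
STRAINS = [
    ("N2", "B", 11, 2), ("N3", "A", 14, 2), ("N1", "A", 14, 2),
    ("N5", "C", 8, 3), ("N6", "A", 2, 1.4), ("N7", "A", 12, 1.4),
    ("N9", "A", 12, 1.4), ("N10", "A", 13, 1.4), ("N11", "A", 15, 1.4),
    ("N13", "A", 9, 1.4), ("N14", "A", 10, 1.4), ("N15", "B", 15, 1),
    ("N16", "C", 10, 4), ("N17", "C", 10, 2), ("N18", "A", 13, 1.4),
    ("N19", "A", 11, 3), ("N20", "C", 10, 4), ("N21", "C", 10, 3),
    ("N22", "C", 13, 3), ("N24", "A", 17, 4.4), ("N23", "A", 11, 1.4),
    ("N25", "B", 12, 2),
]

perfect = [row[2] for row in STRAINS]
imperfect = [row[3] for row in STRAINS]

median_perfect = median(perfect)
range_perfect = (min(perfect), max(perfect))
median_imperfect = median(imperfect)

# A fractional imperfect-repeat count (1.4 or 4.4) marks an incomplete repeat.
incomplete = [row for row in STRAINS if not float(row[3]).is_integer()]
incomplete_genotypes = {row[1] for row in incomplete}
```

With these transcribed values, the ten strains carrying a fractional count all fall in genotype A, in line with the statement in the text.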

Bootscan analysis. Bootscan analysis showed that from the
5' end of the genome to position 12000, there could be a
number of possible recombination sites in the genomes of the
three genotypes (Fig. 2). Immediately upstream of position 11500,
high bootstrap support for clustering between the CoV HKU1
genotype A strain and the CoV HKU1 genotype B strain was
observed. From positions 13000 to 21500, high bootstrap support
for clustering between the CoV HKU1 genotype A strain
and the CoV HKU1 genotype C strain was observed. From
position 22500 to the 3' end of the genome, high bootstrap
support for clustering between the CoV HKU1 genotype B
strain and CoV HKU1 genotype C strain was observed. These
findings indicate that recombination has possibly taken place
between nucleotide positions 11500 and 13000, corresponding
to the nsp6-nsp7 junction, and between nucleotide positions
21500 and 22500, corresponding to the nsp16-HE junction.

7140    WOO ET AL.    J. Virol.

TABLE 2. Amino acid sequences of acidic tandem repeats of the 22 CoV HKU1 strains^a

Genotype  Strain  Amino acid sequence^a
A         N1      (NDDEDVVTGD)14 (NNDEEIVTGD) (NDDQIVVTGD)
A         N3      (NDDEDVVTGD)14 (NNDEEIVTGD) (NDDQIVVTGD)
A         N6      (NDDEDVVTGD)2 (NDDD) (NDDQIVVIGD)
A         N7      (NDDEDVVTGD)12 (NDDD) (NDDQIVVIGD)
A         N9      (NDDEDVVTGD)12 (NDDD) (NDDQIVVIGD)
A         N10     (NDDEDVVTGD)13 (NDDD) (NDDQIVVIGD)
A         N11     (NDDEDVVTGD)15 (NDDD) (NDDQIVVIGD)
A         N13     (NDDEDVVTGD)9 (NDDD) (NDDQIVVIGD)
A         N14     (NDDEDVVTGD)10 (NDDD) (NDDQIVVIGD)
A         N18     (NDDEDVVTGD)13 (NDDD) (NDDQIVVIGD)
A         N19     (NDDEDVVTGD)10 (NNDEEIVTGD) (NDDEDVVTGD)1 (NNDEEIVTGD) (NDDQIVVTGD)
A         N23     (NDDEDVVTGD)11 (NDDD) (NDDQIVVIGD)
A         N24     (NDDEDVVTGD)1 (NDDEHVVTGD) (NDDEHVVTGD) (NDDEDVVTGD)9 (NDDEHVVTGD) (NDDEDVVTGD)7 (NDDD) (NDDQIVVIGD)
B         N2      (NDDEDVVTGD)11 (NDDEEIVTGD) (NDDQIVVTGD)
B         N15     (NDDEDVVTGD)15 (ND-QIVVTGD)
B         N25     (NDDEDVVTGD)12 (NDDEEIVTGD) (NDDQIVVTGD)
C         N5      (NDDEDVVTGD)8 (NNDEDVVTGD) (NNDEESVTGD) (NDDQIVVTGD)
C         N16     (NDDEDVVTGD)10 (NNDEDVVTGD) (NNGEDVVTGD) (NNDEESVTGD) (NDDQIVVTGD)
C         N17     (NDDEDVVTGD)10 (NNDEESVTGD) (NDDQIVVTGD)
C         N20     (NDDEDVVTGD)10 (NNDEDVVTGD) (NNGEDVVTGD) (NNDEESVTGD) (NDDQIVVTGD)
C         N21     (NDDEDVVTGD)10 (NNDEDVVTGD) (NNDEESVTGD) (NDDQIVTGD) (NDDQIVVTGD)
C         N22     (NDDEDVVTGD)13 (NNDEDVVTGD) (NNDEESVTGD) (NDDQIVVTGD)

^a The amino acids underlined denote those that are different from the NDDEDVVTGD tandem repeats.

Comparative sequence analysis of the nsp6-nsp7 junction
and nsp16-HE gene junction. Since both phylogenetic trees
and bootscan analysis showed that there was a possible
recombination site at the nsp6-nsp7 junction and the nsp16-HE gene
junction, multiple alignments among the nucleotide sequences
of the 22 genomes were performed to ascertain the exact sites
of recombination.

Upstream of nucleotide position 11750 of the CoV HKU1
genotype A genome (227 bases before the end of nsp6), there
was high nucleotide identity between the sequences of the CoV
HKU1 genotype A and genotype B strains, whereas downstream
of nucleotide position 11892 of the CoV HKU1 genotype
A genome (85 bases before the end of nsp6), there was
high nucleotide identity between the sequences of the CoV
HKU1 genotype A and genotype C strains (Fig. 3). This indicates
that the site of crossover was probably within a 143-bp
region between nucleotide positions 11750 and 11892.

Upstream of nucleotide position 21502 of the CoV HKU1
genotype A genome (249 bases before the stop codon of
ORF1ab), there was high nucleotide identity between the
sequences of the CoV HKU1 genotype A and genotype C strains,
whereas downstream of nucleotide position 21530 of the CoV
HKU1 genotype A genome (221 bases before the stop codon
of ORF1ab), there was high nucleotide identity between the
sequences of the CoV HKU1 genotype B and genotype C
strains, including a 13-bp insertion just downstream of the stop
codon of ORF1ab (Fig. 4). This indicates that the site of
crossover was probably within a 29-bp region between
nucleotide positions 21502 and 21530.
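The sizes of the two crossover regions follow from inclusive counting of the bounding nucleotide positions. The short Python sketch below makes that arithmetic explicit; the function and constant names are hypothetical.

```python
def region_length(first: int, last: int) -> int:
    """Number of positions in the inclusive interval [first, last]."""
    if last < first:
        raise ValueError("last must not be smaller than first")
    return last - first + 1


# Bounding positions quoted in the text (CoV HKU1 genotype A genome numbering).
NSP6_NSP7_CROSSOVER = (11750, 11892)
NSP16_HE_CROSSOVER = (21502, 21530)

nsp6_nsp7_length = region_length(*NSP6_NSP7_CROSSOVER)
nsp16_he_length = region_length(*NSP16_HE_CROSSOVER)

# The quoted offsets agree with the positions: 227 and 85 bases before the
# end of nsp6, and 249 and 221 bases before the stop codon of ORF1ab.
nsp6_offsets_consistent = (227 - 85) == (11892 - 11750)
orf1ab_offsets_consistent = (249 - 221) == (21530 - 21502)
```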

DISCUSSION 

This is the first documented evidence for natural recombination
in a coronavirus associated with human infection.
Coronaviruses are unique in having a high frequency of
homologous RNA recombination, as a result of random template
switching during RNA replication, thought to be mediated
by a “copy choice” mechanism (2, 4, 13, 14, 21, 35). In
feline coronavirus (FCoV), it has been documented that FCoV
type II strains originated from a double recombination between
FCoV type I and canine coronavirus, and the site of
recombination has been pinpointed to a region of about 50
nucleotides in the M gene by multiple alignment (9). As for
recombination between different strains of MHV, in vitro studies
have shown variations in both sites and rates of recombination,
with the S gene having a frequency threefold that of the
polymerase gene (7, 21). In the present study, by comparing
the sequences of 22 complete genomes of CoV HKU1 strains,
we documented that major recombination has occurred among
the three CoV HKU1 genotypes. Both phylogenetic and bootscan
analysis showed that the nucleotide sequences of the six genotype
C strains were almost identical to those of the 13 genotype
A strains from nsp10 to nsp16 (Fig. 1B and 4). Interestingly,
the topologies of the phylogenetic trees changed dramatically
starting from the HE gene. From HE to N, the nucleotide
sequences of the six genotype C strains were almost identical to
those of the three genotype B strains (Fig. 1B). This is also in
line with results of bootscan analysis, suggesting recombination
between genotypes A and B, giving rise to genotype C (Fig. 2).
Multiple alignments of the nucleotide sequences of the
nsp16-HE regions of the three genotypes confirmed our suspicion,
and results localized the site of recombination to a
stretch of 29 nucleotides in nsp16, just upstream of the stop
codon of ORF1ab (Fig. 4). This is in keeping with the finding
that the putative IRES sequences of all the genotype B and genotype C
strains were the same but different from those of the genotype
A strains, as the IRES is located downstream of HE. In addition to the
recombination site in nsp16, there was another one at the end
of nsp6, also evidenced by a shift in clustering in the phylogenetic
trees, bootscan analysis, and multiple-alignment results
(Fig. 1B, 2, and 3). In contrast to the nsp16 recombination site,
recombination has occurred between genotypes B and C in this
region, giving rise to genotype A. Furthermore, as shown in
bootscan and phylogenetic analyses, additional recombination
events might have occurred in ORF1ab upstream of nsp5 (Fig.
1B and 2). However, due to the relatively small variations in
the sequences among the three genotypes, these putative
recombination sites were difficult to ascertain with multiple
alignments.

FIG. 2. Bootscan analysis of the CoV HKU1 genomes. Bootscanning was conducted with Simplot version 3.5.1 (F84 model; window size, 1,000
bp; step, 200 bp) on a gapless nucleotide alignment, generated with ClustalX, with the genome sequence of HCoV OC43 (GenBank accession no.
AY585229) as the query sequence. The dashed line denotes CoV HKU1 genotype A (NC_006577), the solid line denotes CoV HKU1 genotype
B (AY884001), and the dotted line denotes CoV HKU1 genotype C (DQ339101).

A novel genotype, genotype C, of CoV HKU1 is defined. It
is well known that recombination is an important mechanism
for the generation and evolution of virus genotypes (12,
29, 31). In our previous study, we showed that seven of the nine
CoV HKU1 strains were of genotype A and one of the nine
strains was of genotype B by pol, S gene, and N gene sequence
analysis (41). In the present study, we showed that the latter
half of the genomes of the six genotype C strains probably
represents a result of recombination between genotypes A and
B. Analysis of the complete genomes of more CoV HKU1
strains from other countries will reveal the relative prevalences
of the different genotypes in different localities. From the results
of the present study, no association was observed between
the genotypes and clinical characteristics of the patients.
Furthermore, amplification and sequencing of a single gene is not
sufficient to define the genotype of CoV HKU1. Genotyping would
require amplification and sequencing of at least two gene loci,
one from nsp10 to nsp16 (e.g., pol or helicase) and another
from HE to N (e.g., S or N).
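The two-locus requirement can be read as a small decision table. From nsp10 to nsp16, genotypes A and C cluster together and apart from B; from HE to N, genotypes B and C cluster together and apart from A. The Python sketch below encodes that reading; the cluster labels "A/C" and "B/C" and the function name are hypothetical, and the "unassigned" outcome simply marks a combination not described in this study.

```python
# Hypothetical cluster labels for the two loci described in the text.
ORF1AB_CLUSTERS = {"A/C", "B"}       # locus from nsp10 to nsp16 (e.g., pol or helicase)
STRUCTURAL_CLUSTERS = {"A", "B/C"}   # locus from HE to N (e.g., S or N)

GENOTYPE_BY_CLUSTERS = {
    ("A/C", "A"): "A",
    ("B", "B/C"): "B",
    ("A/C", "B/C"): "C",
}


def assign_genotype(orf1ab_cluster: str, structural_cluster: str) -> str:
    """Combine the cluster memberships at two loci into a genotype call."""
    if orf1ab_cluster not in ORF1AB_CLUSTERS:
        raise ValueError(f"unknown nsp10-nsp16 cluster: {orf1ab_cluster!r}")
    if structural_cluster not in STRUCTURAL_CLUSTERS:
        raise ValueError(f"unknown HE-N cluster: {structural_cluster!r}")
    # Combinations not described in the study are reported as unassigned.
    return GENOTYPE_BY_CLUSTERS.get((orf1ab_cluster, structural_cluster), "unassigned")
```

Either locus alone leaves two genotypes indistinguishable: an "A/C" result at the first locus cannot separate A from C, and a "B/C" result at the second cannot separate B from C.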

FIG. 3. Comparative sequence analysis of the nsp6-nsp7 junction. Multiple alignment of the nucleotide sequences of CoV HKU1 genotypes A, B, and C. In CoV
HKU1 genotype B and CoV HKU1 genotype C, only the nucleotides differing from those in CoV HKU1 genotype A are depicted. The nucleotides in CoV HKU1
genotype C that are the same as those in CoV HKU1 genotype A but different from those in CoV HKU1 genotype B are highlighted in black, and those in CoV HKU1
genotype B that are the same as those in CoV HKU1 genotype A but different from those in CoV HKU1 genotype C are highlighted in gray. The putative template
switching region is underlined and bold. The first (TCA) and last (CAG) codons of nsp7 are also underlined. The arrows denote positions with nucleotide polymorphism
(at position 11414, N16, N17, N20, N21, and N22 [genotype C] were T instead of C; at 11422, N16 and N20 [genotype C] were C instead of T; at 11449, N25 [genotype
B] was C instead of T; at 11528, N16, N17, N20, and N22 [genotype C] were C instead of T; at 11740, N6, N7, N9, N10, N11, N13, N14, N18, N23, and N24 [genotype
A] were T instead of C; at 12095, N6 and N7 [genotype A] were T instead of C; at 12140, N23 and N24 [genotype A] were C instead of T; at 12367, N15 and N25
[genotype B] were A instead of G; and at 12400, N6, N7, N9, N10, N11, N13, N14, N18, N23, and N24 [genotype A] were C instead of T).

FIG. 4. Comparative sequence analysis of the nsp16-HE gene junction. Multiple alignments of the nucleotide sequences of CoV HKU1 genotypes A, B,
and C. In CoV HKU1 genotype A and CoV HKU1 genotype B, only the nucleotides differing from those in CoV HKU1 genotype C are depicted. The
nucleotides in CoV HKU1 genotype C that are the same as those in CoV HKU1 genotype A but different from those in CoV HKU1 genotype B are
highlighted in gray, and those in CoV HKU1 genotype C that are the same as those in CoV HKU1 genotype B but different from those in CoV HKU1
genotype A are highlighted in black. The putative template switching region is underlined and bold. The stop codon of ORF1ab (TAG) and the start codon
of the HE gene (ATG) are also underlined. The arrows denote positions with nucleotide polymorphisms (at 21297, N6, N7, N9, N10, N11, N13, N14, N23,
and N24 [genotype A] were T instead of G; at 21429, N6 [genotype A] was C instead of T; at 21576, N15 [genotype B] was C instead of T; at 21908, N15
and N25 [genotype B] were G instead of A; and at 21949, N14 was T instead of C).

The origin and function of the ATR located inside the acidic
domain upstream of PL1pro, unique to CoV HKU1, remain
enigmatic. Significant variations were observed among the
ATRs of CoV HKU1 strains. Only two pairs of CoV HKU1
strains (N1 and N3; N7 and N9) possessed the same nucleotide
sequence in their ATRs. No relationship was found between
the number of repeats and the genotype or virulence of
the strains. We speculate that this “independent evolution” of
the number of repeats was due to the random expansion or
deletion of part of the repeat region during the process of viral
replication as a result of inaccurate replication by the viral
polymerase or recombination between the repeat regions of
different CoV HKU1 strains, a phenomenon widely observed
in tandem repeats of genomes in all domains of life (3, 24). On
the other hand, the sequence of the imperfect repeats seemed
to coevolve with the rest of the genomes, most notably in that all
10 CoV HKU1 strains with incomplete imperfect repeats
(NDDD) were of genotype A (Tables 1 and 2). This could be
due to the deletion of part of a repeat in one genotype A strain
and subsequent expansion or deletion of whole repeats in its
descendants. Further studies will delineate whether this ATR
is useful for the molecular typing of CoV HKU1 strains.
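If the ATR were to be used for typing, the perfect repeats would first have to be counted in each translated nsp3 sequence. The Python sketch below counts copies of the NDDEDVVTGD unit and finds the longest uninterrupted run. The function names are hypothetical, and the test sequence is an assumed reading of the N19 pattern in Table 2 (ten perfect copies, one imperfect repeat, one perfect copy, then two imperfect repeats), used only for illustration.

```python
ATR_UNIT = "NDDEDVVTGD"


def count_perfect_repeats(protein: str, unit: str = ATR_UNIT) -> int:
    """Total number of non-overlapping copies of the unit in the sequence."""
    return protein.count(unit)


def longest_tandem_run(protein: str, unit: str = ATR_UNIT) -> int:
    """Largest number of back-to-back copies of the unit at any position."""
    best = 0
    for start in range(len(protein)):
        run = 0
        pos = start
        # Extend the run while another full copy of the unit follows directly.
        while protein.startswith(unit, pos):
            run += 1
            pos += len(unit)
        best = max(best, run)
    return best


# Assumed reading of the N19 repeat pattern, for illustration only.
N19_LIKE = (ATR_UNIT * 10 + "NNDEEIVTGD" + ATR_UNIT
            + "NNDEEIVTGD" + "NDDQIVVTGD")
```

The two functions differ when imperfect repeats interrupt the array, as in this example, so a typing scheme would need to state which count it reports.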

This high frequency of recombination has resulted in the
generation of a high diversity of coronaviruses in different
animals. Before the SARS epidemic in 2003, a total of 19
coronaviruses were known, including 2 human, 13 mammalian,
and 4 avian coronaviruses. After the SARS epidemic, within a
short period of 3 years, 20 additional novel coronaviruses were
described (5, 6, 11, 16, 19, 25, 26, 34, 36, 38, 43). These include
3 human coronaviruses, 11 mammalian coronaviruses, and 6
avian coronaviruses. Notably, there was a recent discovery of at
least eight different species of coronaviruses in bats in Hong
Kong, including SARS CoV-like viruses and a probable novel
subgroup, group 2c, of coronavirus (16, 43). The high frequency
of recombination in such a high diversity of coronaviruses
may easily result in the generation of novel coronavirus
species or genotypes that can cross host species barriers, leading
to major zoonotic outbreaks with disastrous consequences.
The potential for the generation of novel species leading to zoonotic
outbreaks and major consequences is analogous to the
situation of avian and human influenza epidemiology, although
in influenza the mechanism of generation of novel types and variants is
reassortment, which is different from recombination in coronaviruses
(18, 44). Amplification of conserved regions in coronaviruses
using RNA extracted from various animal specimens
will lead to the discovery of more coronaviruses, and subsequent
complete genome sequencing and comparative genome
analysis will reveal the intricate relationships among the various
coronaviruses.